Introduction

The survival of humanity has high dependency on the existence of natural resources.There has however resulted in the need to improve the management of these resources as well as strengthen the knowledge base on the interactions between humans and the earth through resources available to them.The understanding of these interactions and variations of vegetation (NDVI) and land cover therefore becomes very essential source of information because it tend to highlight more spatial activities in the environment. Key data analysis techniques will be employed on the cleaned dataset which have the same coordination systems as the land cover/land use (MCD12Q1) data.

Required libraries

This project will utilize the libraries below to conduct spatial data analysis and visualization.

setwd("C:/Users/charl/OneDrive/Desktop/DSCI 607 Data Analytics for Environmental Sciences")
library(sf)
## Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
library(sp)
library(raster)
library(viridis)
## Loading required package: viridisLite
library(terra)
## terra 1.7.65
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::extract() masks terra::extract(), raster::extract()
## ✖ dplyr::filter()  masks stats::filter()
## ✖ dplyr::lag()     masks stats::lag()
## ✖ dplyr::select()  masks raster::select()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Data Description

The data set used for this project has same coordination systems as the landcover(MCD12Q1) data. The MODIS Land Cover Type Product (MCD12Q1) supplies global maps of land cover at annual time steps and 500-m spatial resolution for 2001-present. The product contains 13 Science Data Sets (SDS;Table 1), including 5 legacy classification schemes (IGBP, UMD, LAI, BGC, and PFT; Tables 3- 7) and a new three layer legend based on the Land Cover Classification System (LCCS) from the Food and Agriculture Organization (Tables 8- 10; Di Gregorio, 2005; Sulla-Menashe et al., 2011). Also included are a Quality Assurance (QA) layer, the posterior probabilities for the three LCCS layers, and the binary land water mask used by the product. MCD12Q1 has been Stage 2 Validated based on cross-validation of the training dataset used to create the maps. Two data set are used in this project namely; the landcover and NVDI datasets.

Load Required Dataset

The data set required for this project work will be loaded into R

#######Loading Dataset1 ###############
Landcover_NDVI_Rainfall_filter <- readRDS("C:/Users/charl/OneDrive/Desktop/DSCI 607 Data Analytics for Environmental Sciences/Landcover_NDVI_Rainfall_filter.rds")
head(Landcover_NDVI_Rainfall_filter)
## # A tibble: 6 × 6
## # Groups:   Month, x, y [6]
##   Month         x        y Landcover MeanNDVI Rainfall
##   <fct>     <dbl>    <dbl> <fct>        <dbl>    <dbl>
## 1 1     -7699562. 4206648. Savannas    0.121      99.9
## 2 1     -7699562. 4207111. Savannas    0.0598     99.9
## 3 1     -7699562. 4207574. Savannas    0.0140     99.9
## 4 1     -7699099. 4206184. Savannas    0.109      99.9
## 5 1     -7699099. 4206648. Savannas    0.211      99.9
## 6 1     -7699099. 4207111. Savannas    0.153      99.9
class(Landcover_NDVI_Rainfall_filter)
## [1] "grouped_df" "tbl_df"     "tbl"        "data.frame"
names(Landcover_NDVI_Rainfall_filter)
## [1] "Month"     "x"         "y"         "Landcover" "MeanNDVI"  "Rainfall"
dim(Landcover_NDVI_Rainfall_filter)
## [1] 4857113       6
#######Loading Dataset2 ###############
NDVI_timeseries_Month <- readRDS("C:/Users/charl/OneDrive/Desktop/DSCI 607 Data Analytics for Environmental Sciences/NDVI_timeseries_Month.rds")
head(NDVI_timeseries_Month)# first 6 observations
##   Year JulianDay      NDVI Month  YearMonth
## 1 2013         1 0.1092190     1 2013-01-01
## 2 2013        17 0.3531177     1 2013-01-01
## 3 2013        33 0.3280171     2 2013-02-01
## 4 2013        49 0.2745036     2 2013-02-01
## 5 2013        65 0.2579496     3 2013-03-01
## 6 2013        81 0.3314169     3 2013-03-01
class(NDVI_timeseries_Month)
## [1] "data.frame"
names(NDVI_timeseries_Month)# names of columns used in the dataset
## [1] "Year"      "JulianDay" "NDVI"      "Month"     "YearMonth"
#str(NDVI_timeseries_Month) # structure of dataset
dim(NDVI_timeseries_Month) # dimension of the data set
## [1] 230   5
unique(NDVI_timeseries_Month$Year) # checking the different years within the dataset
##  [1] 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022

Descriptive Statistics

Below is the descriptive analysis of the data set used for this project work. This study employs the R statistical software to conduct descriptive analysis on both data set coupled with checking the presence of any missing cases found within the projected dataset. The Landcover dataset has initial 4857113 observations and 6 variables. The NDVI dataset has 230 initial observations coupled with 5 variables.

Summary of Landcover Dataset

Below is a computation of the basic statistics(minimum, maximum, mean, median) of all numeric variables found in this dataset.From the output below it can be suggested that the minimum x and y coordinate is -7699562 and 4201551 respectively.

summary(Landcover_NDVI_Rainfall_filter)
##      Month               x                  y          
##  7      : 406160   Min.   :-7699562   Min.   :4201551  
##  9      : 406141   1st Qu.:-7454470   1st Qu.:4344252  
##  8      : 406135   Median :-7341885   Median :4441547  
##  6      : 405966   Mean   :-7350710   Mean   :4440260  
##  10     : 405949   3rd Qu.:-7243663   3rd Qu.:4539770  
##  5      : 405405   Max.   :-7037489   Max.   :4640309  
##  (Other):2421357                                       
##                         Landcover          MeanNDVI         Rainfall      
##  Savannas                    :3360759   Min.   :0.0000   Min.   :  4.425  
##  Deciduous needleleaf forests: 486789   1st Qu.:0.3305   1st Qu.: 75.326  
##  Mixed forests               : 332414   Median :0.4694   Median :100.695  
##  Permanent wetlands          : 242807   Mean   :0.5096   Mean   :105.544  
##  Closed shrublands           : 180853   3rd Qu.:0.7054   3rd Qu.:134.437  
##  Grasslands                  : 161332   Max.   :0.9761   Max.   :245.911  
##  (Other)                     :  92159
#A summary ofthe landcover dataset by landcover type
by(Landcover_NDVI_Rainfall_filter, Landcover_NDVI_Rainfall_filter$Landcover, summary)
## Landcover_NDVI_Rainfall_filter$Landcover: Evergreen needleleaf forests
##      Month           x                  y          
##  4      : 24   Min.   :-7571688   Min.   :4269195  
##  5      : 24   1st Qu.:-7555009   1st Qu.:4285874  
##  6      : 24   Median :-7517944   Median :4323866  
##  7      : 24   Mean   :-7487870   Mean   :4340228  
##  8      : 24   3rd Qu.:-7467906   3rd Qu.:4340545  
##  9      : 24   Max.   :-7210304   Max.   :4629189  
##  (Other):135                                       
##                         Landcover      MeanNDVI         Rainfall     
##  Evergreen needleleaf forests:279   Min.   :0.0001   Min.   :  8.69  
##  Evergreen broadleaf forests :  0   1st Qu.:0.2810   1st Qu.: 87.41  
##  Deciduous needleleaf forests:  0   Median :0.5393   Median :122.57  
##  Deciduous broadleaf forests :  0   Mean   :0.4964   Mean   :119.55  
##  Mixed forests               :  0   3rd Qu.:0.7139   3rd Qu.:146.32  
##  Closed shrublands           :  0   Max.   :0.8729   Max.   :231.18  
##  (Other)                     :  0                                    
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Evergreen broadleaf forests
##      Month           x                  y          
##  1      : 52   Min.   :-7571688   Min.   :4266415  
##  2      : 52   1st Qu.:-7543426   1st Qu.:4272438  
##  4      : 52   Median :-7521650   Median :4336839  
##  5      : 52   Mean   :-7452320   Mean   :4357886  
##  6      : 52   3rd Qu.:-7410919   3rd Qu.:4370313  
##  7      : 52   Max.   :-7066677   Max.   :4628726  
##  (Other):304                                       
##                         Landcover      MeanNDVI          Rainfall      
##  Evergreen broadleaf forests :616   Min.   :0.00045   Min.   :  6.045  
##  Evergreen needleleaf forests:  0   1st Qu.:0.33806   1st Qu.: 87.581  
##  Deciduous needleleaf forests:  0   Median :0.57660   Median :114.028  
##  Deciduous broadleaf forests :  0   Mean   :0.53075   Mean   :116.286  
##  Mixed forests               :  0   3rd Qu.:0.74777   3rd Qu.:146.196  
##  Closed shrublands           :  0   Max.   :0.88250   Max.   :233.292  
##  (Other)                     :  0                                      
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Deciduous needleleaf forests
##      Month              x                  y          
##  4      : 40569   Min.   :-7692613   Min.   :4209428  
##  5      : 40569   1st Qu.:-7528600   1st Qu.:4280778  
##  6      : 40569   Median :-7469296   Median :4331742  
##  7      : 40569   Mean   :-7467562   Mean   :4333825  
##  8      : 40569   3rd Qu.:-7428061   3rd Qu.:4365564  
##  9      : 40569   Max.   :-7047681   Max.   :4640309  
##  (Other):243375                                       
##                         Landcover         MeanNDVI          Rainfall      
##  Deciduous needleleaf forests:486789   Min.   :0.00185   Min.   :  5.417  
##  Evergreen needleleaf forests:     0   1st Qu.:0.46130   1st Qu.: 86.549  
##  Evergreen broadleaf forests :     0   Median :0.70995   Median :120.657  
##  Deciduous broadleaf forests :     0   Mean   :0.66534   Mean   :117.640  
##  Mixed forests               :     0   3rd Qu.:0.85970   3rd Qu.:147.056  
##  Closed shrublands           :     0   Max.   :0.95215   Max.   :245.911  
##  (Other)                     :     0                                      
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Deciduous broadleaf forests
##      Month           x                  y          
##  4      :148   Min.   :-7645355   Min.   :4234910  
##  5      :148   1st Qu.:-7618483   1st Qu.:4250663  
##  6      :148   Median :-7608290   Median :4260855  
##  7      :148   Mean   :-7544252   Mean   :4308428  
##  8      :148   3rd Qu.:-7530917   3rd Qu.:4344252  
##  9      :148   Max.   :-7056484   Max.   :4640309  
##  (Other):876                                       
##                         Landcover       MeanNDVI          Rainfall      
##  Deciduous broadleaf forests :1764   Min.   :0.00215   Min.   :  6.045  
##  Evergreen needleleaf forests:   0   1st Qu.:0.57970   1st Qu.: 95.409  
##  Evergreen broadleaf forests :   0   Median :0.69750   Median :124.352  
##  Deciduous needleleaf forests:   0   Mean   :0.67031   Mean   :119.769  
##  Mixed forests               :   0   3rd Qu.:0.81060   3rd Qu.:147.002  
##  Closed shrublands           :   0   Max.   :0.92635   Max.   :231.176  
##  (Other)                     :   0                                      
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Mixed forests
##      Month              x                  y          
##  5      : 27707   Min.   :-7692613   Min.   :4209428  
##  6      : 27707   1st Qu.:-7523967   1st Qu.:4278925  
##  7      : 27707   Median :-7472076   Median :4325256  
##  8      : 27707   Mean   :-7458441   Mean   :4328370  
##  10     : 27707   3rd Qu.:-7393313   3rd Qu.:4359541  
##  11     : 27707   Max.   :-7042122   Max.   :4640309  
##  (Other):166172                                       
##                         Landcover         MeanNDVI         Rainfall      
##  Mixed forests               :332414   Min.   :0.0007   Min.   :  5.417  
##  Evergreen needleleaf forests:     0   1st Qu.:0.4543   1st Qu.: 87.537  
##  Evergreen broadleaf forests :     0   Median :0.6578   Median :120.829  
##  Deciduous needleleaf forests:     0   Mean   :0.6269   Mean   :118.096  
##  Deciduous broadleaf forests :     0   3rd Qu.:0.7945   3rd Qu.:147.872  
##  Closed shrublands           :     0   Max.   :0.9278   Max.   :245.911  
##  (Other)                     :     0                                     
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Closed shrublands
##      Month             x                  y          
##  4      :15073   Min.   :-7692613   Min.   :4215451  
##  5      :15073   1st Qu.:-7505898   1st Qu.:4276145  
##  6      :15073   Median :-7457250   Median :4335912  
##  7      :15073   Mean   :-7408129   Mean   :4379790  
##  8      :15073   3rd Qu.:-7307137   3rd Qu.:4421625  
##  9      :15073   Max.   :-7037489   Max.   :4640309  
##  (Other):90415                                       
##                         Landcover         MeanNDVI         Rainfall      
##  Closed shrublands           :180853   Min.   :0.0004   Min.   :  5.417  
##  Evergreen needleleaf forests:     0   1st Qu.:0.4439   1st Qu.: 82.779  
##  Evergreen broadleaf forests :     0   Median :0.6216   Median :114.723  
##  Deciduous needleleaf forests:     0   Mean   :0.5977   Mean   :114.723  
##  Deciduous broadleaf forests :     0   3rd Qu.:0.7622   3rd Qu.:147.234  
##  Mixed forests               :     0   Max.   :0.9256   Max.   :245.911  
##  (Other)                     :     0                                     
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Open shrublands
##      Month             x                  y          
##  7      : 5626   Min.   :-7697246   Min.   :4201551  
##  8      : 5626   1st Qu.:-7538793   1st Qu.:4288191  
##  9      : 5626   Median :-7402579   Median :4375294  
##  11     : 5626   Mean   :-7413754   Mean   :4394426  
##  6      : 5625   3rd Qu.:-7325206   3rd Qu.:4447107  
##  10     : 5624   Max.   :-7038878   Max.   :4640309  
##  (Other):33579                                       
##                         Landcover        MeanNDVI          Rainfall      
##  Open shrublands             :67332   Min.   :0.00005   Min.   :  4.425  
##  Evergreen needleleaf forests:    0   1st Qu.:0.37605   1st Qu.: 80.820  
##  Evergreen broadleaf forests :    0   Median :0.51767   Median :110.232  
##  Deciduous needleleaf forests:    0   Mean   :0.51417   Mean   :110.697  
##  Deciduous broadleaf forests :    0   3rd Qu.:0.66760   3rd Qu.:137.161  
##  Mixed forests               :    0   Max.   :0.91800   Max.   :245.911  
##  (Other)                     :    0                                      
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Woody savannas
##      Month            x                  y          
##  7      :1031   Min.   :-7694929   Min.   :4206648  
##  8      :1031   1st Qu.:-7446594   1st Qu.:4345642  
##  6      :1030   Median :-7265438   Median :4524944  
##  11     :1025   Mean   :-7299477   Mean   :4483072  
##  9      :1022   3rd Qu.:-7160730   3rd Qu.:4603707  
##  5      :1021   Max.   :-7039342   Max.   :4640309  
##  (Other):5509                                       
##                         Landcover        MeanNDVI         Rainfall     
##  Woody savannas              :11669   Min.   :0.0001   Min.   :  4.54  
##  Evergreen needleleaf forests:    0   1st Qu.:0.2163   1st Qu.: 74.36  
##  Evergreen broadleaf forests :    0   Median :0.4122   Median :101.11  
##  Deciduous needleleaf forests:    0   Mean   :0.4116   Mean   :104.06  
##  Deciduous broadleaf forests :    0   3rd Qu.:0.6050   3rd Qu.:130.57  
##  Mixed forests               :    0   Max.   :0.8750   Max.   :245.65  
##  (Other)                     :    0                                    
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Savannas
##      Month               x                  y          
##  5      : 280522   Min.   :-7699562   Min.   :4201551  
##  7      : 280522   1st Qu.:-7390533   1st Qu.:4403092  
##  8      : 280522   Median :-7302040   Median :4483709  
##  9      : 280522   Mean   :-7318253   Mean   :4471498  
##  10     : 280522   3rd Qu.:-7221887   3rd Qu.:4553669  
##  11     : 280522   Max.   :-7038878   Max.   :4640309  
##  (Other):1677627                                       
##                         Landcover          MeanNDVI         Rainfall      
##  Savannas                    :3360759   Min.   :0.0000   Min.   :  4.425  
##  Evergreen needleleaf forests:      0   1st Qu.:0.2948   1st Qu.: 73.276  
##  Evergreen broadleaf forests :      0   Median :0.4183   Median : 96.375  
##  Deciduous needleleaf forests:      0   Mean   :0.4694   Mean   :101.708  
##  Deciduous broadleaf forests :      0   3rd Qu.:0.6449   3rd Qu.:130.371  
##  Mixed forests               :      0   Max.   :0.9517   Max.   :245.911  
##  (Other)                     :      0                                     
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Grasslands
##      Month             x                  y          
##  6      :13457   Min.   :-7694003   Min.   :4207111  
##  7      :13457   1st Qu.:-7367830   1st Qu.:4419308  
##  8      :13457   Median :-7332619   Median :4446180  
##  10     :13456   Mean   :-7305090   Mean   :4482613  
##  5      :13455   3rd Qu.:-7226520   3rd Qu.:4576835  
##  9      :13455   Max.   :-7047681   Max.   :4640309  
##  (Other):80595                                       
##                         Landcover         MeanNDVI         Rainfall     
##  Grasslands                  :161332   Min.   :0.0000   Min.   :  4.54  
##  Evergreen needleleaf forests:     0   1st Qu.:0.3342   1st Qu.: 73.10  
##  Evergreen broadleaf forests :     0   Median :0.4582   Median : 98.38  
##  Deciduous needleleaf forests:     0   Mean   :0.4502   Mean   :100.18  
##  Deciduous broadleaf forests :     0   3rd Qu.:0.5767   3rd Qu.:124.92  
##  Mixed forests               :     0   Max.   :0.8972   Max.   :245.65  
##  (Other)                     :     0                                    
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Permanent wetlands
##      Month              x                  y          
##  2      : 20235   Min.   :-7692613   Min.   :4208964  
##  4      : 20235   1st Qu.:-7492462   1st Qu.:4297920  
##  5      : 20235   Median :-7423428   Median :4356761  
##  6      : 20235   Mean   :-7392529   Mean   :4396291  
##  7      : 20235   3rd Qu.:-7288604   3rd Qu.:4483246  
##  8      : 20235   Max.   :-7038415   Max.   :4640309  
##  (Other):121397                                       
##                         Landcover         MeanNDVI          Rainfall      
##  Permanent wetlands          :242807   Min.   :0.00035   Min.   :  4.425  
##  Evergreen needleleaf forests:     0   1st Qu.:0.43520   1st Qu.: 81.455  
##  Evergreen broadleaf forests :     0   Median :0.60495   Median :107.420  
##  Deciduous needleleaf forests:     0   Mean   :0.58597   Mean   :111.931  
##  Deciduous broadleaf forests :     0   3rd Qu.:0.74050   3rd Qu.:143.565  
##  Mixed forests               :     0   Max.   :0.93100   Max.   :245.911  
##  (Other)                     :     0                                      
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Croplands
##      Month          x                  y          
##  6      :25   Min.   :-7259879   Min.   :4628262  
##  7      :25   1st Qu.:-7254782   1st Qu.:4628262  
##  8      :25   Median :-7249222   Median :4629189  
##  9      :25   Mean   :-7243718   Mean   :4631056  
##  5      :24   3rd Qu.:-7232543   3rd Qu.:4634285  
##  10     :24   Max.   :-7212621   Max.   :4639845  
##  (Other):61                                       
##                         Landcover      MeanNDVI          Rainfall     
##  Croplands                   :209   Min.   :0.00085   Min.   : 39.50  
##  Evergreen needleleaf forests:  0   1st Qu.:0.10150   1st Qu.: 60.03  
##  Evergreen broadleaf forests :  0   Median :0.19670   Median :111.70  
##  Deciduous needleleaf forests:  0   Mean   :0.21819   Mean   :110.61  
##  Deciduous broadleaf forests :  0   3rd Qu.:0.30920   3rd Qu.:142.27  
##  Mixed forests               :  0   Max.   :0.71765   Max.   :192.67  
##  (Other)                     :  0                                     
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Urban and built-up lands
##      Month            x                  y          
##  7      :1691   Min.   :-7654621   Min.   :4263172  
##  9      :1686   1st Qu.:-7254319   1st Qu.:4631042  
##  8      :1666   Median :-7245053   Median :4635212  
##  6      :1500   Mean   :-7267513   Mean   :4601537  
##  10     :1494   3rd Qu.:-7233006   3rd Qu.:4637992  
##  5      : 953   Max.   :-7146830   Max.   :4640309  
##  (Other):1300                                       
##                         Landcover        MeanNDVI          Rainfall      
##  Urban and built-up lands    :10290   Min.   :0.00000   Min.   :  6.045  
##  Evergreen needleleaf forests:    0   1st Qu.:0.01900   1st Qu.: 87.405  
##  Evergreen broadleaf forests :    0   Median :0.05443   Median :115.095  
##  Deciduous needleleaf forests:    0   Mean   :0.13592   Mean   :117.031  
##  Deciduous broadleaf forests :    0   3rd Qu.:0.20299   3rd Qu.:164.237  
##  Mixed forests               :    0   Max.   :0.97610   Max.   :231.176  
##  (Other)                     :    0                                      
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Cropland/natural vegetation mosaics
## NULL
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Snow and ice
## NULL
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Barren
## NULL
## ------------------------------------------------------------ 
## Landcover_NDVI_Rainfall_filter$Landcover: Water bodies
## NULL
sum(is.na(Landcover_NDVI_Rainfall_filter))
## [1] 0

Summary of NDVI Dataset

Below is a computation of the basic statistics(minimum, maximum, mean, median) of all numeric variables found in NDVI dataset

library(skimr)
## 
## Attaching package: 'skimr'
## The following object is masked from 'package:raster':
## 
##     bind
skim(NDVI_timeseries_Month)
Data summary
Name NDVI_timeseries_Month
Number of rows 230
Number of columns 5
_______________________
Column type frequency:
Date 1
numeric 4
________________________
Group variables None

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
YearMonth 0 1 2013-01-01 2022-12-01 2017-12-16 120

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Year 0 1 2017.50 2.88 2013 2015.00 2017.50 2020.00 2022.00 ▇▇▇▇▇
JulianDay 0 1 177.00 106.36 1 81.00 177.00 273.00 353.00 ▇▆▇▆▇
NDVI 0 1 0.46 0.22 0 0.32 0.42 0.59 0.87 ▁▇▇▂▅
Month 0 1 6.39 3.49 1 3.00 6.00 10.00 12.00 ▇▅▅▃▇
sum(is.na(NDVI_timeseries_Month))
## [1] 0

Exploratory Data Analysis

For the purpose of this project, there will be a conduct of data analysis and data visualization which includes Spatial mapping NDVI and land cover/land use in 2019. The following will be conducted on the dataset;

  • Map the average NDVI in 2019 or a specific date NDVI in 2019.

  • Describe the spatial distributions of NDVI and land cover/land use

Spatial Mapping

# Read the shapefile data for IN boundary
IN<-st_read("Indiana.shp")
## Reading layer `Indiana' from data source 
##   `C:\Users\charl\OneDrive\Desktop\DSCI 607 Data Analytics for Environmental Sciences\Indiana.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 14 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.09789 ymin: 37.77173 xmax: -84.78459 ymax: 41.76137
## Geodetic CRS:  NAD83
# Reading in the downloaded land cover and NDVI raster data at 2019
IGBP_raster <- raster("MCD12Q1_LC1_2019_001.tif")
NDVI_raster <- raster("MOD13A1_NDVI_2019_209.tif")
## Transforming data
# Transfer the project of the IN polygon with the same projection with the raster images
crs(IGBP_raster) # cordinate reference system
## Coordinate Reference System:
## Deprecated Proj.4 representation:
##  +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs 
## WKT2 2019 representation:
## PROJCRS["unknown",
##     BASEGEOGCRS["unknown",
##         DATUM["unknown",
##             ELLIPSOID["unknown",6371007.181,0,
##                 LENGTHUNIT["metre",1,
##                     ID["EPSG",9001]]]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8901]]],
##     CONVERSION["unknown",
##         METHOD["Sinusoidal"],
##         PARAMETER["Longitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["False easting",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1,
##                 ID["EPSG",9001]]]]
crs(IN)
## [1] "GEOGCRS[\"NAD83\",\n    DATUM[\"North American Datum 1983\",\n        ELLIPSOID[\"GRS 1980\",6378137,298.257222101,\n            LENGTHUNIT[\"metre\",1]]],\n    PRIMEM[\"Greenwich\",0,\n        ANGLEUNIT[\"degree\",0.0174532925199433]],\n    CS[ellipsoidal,2],\n        AXIS[\"geodetic latitude (Lat)\",north,\n            ORDER[1],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        AXIS[\"geodetic longitude (Lon)\",east,\n            ORDER[2],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n    ID[\"EPSG\",4269]]"
IN_pro <- st_transform(IN,crs(IGBP_raster))
## Cropping raster data
IGBP_raster_IN <- mask(IGBP_raster,IN_pro)  ##use crop() if necessary
NDVI_raster_IN <- mask(NDVI_raster,IN_pro)
## Visualize the raster images
plot(IN_pro) # This will create a simple plot of the spatial data
## Warning: plotting the first 9 out of 14 attributes; use max.plot = 14 to plot
## all

## plot of the used raster image files
plot(IGBP_raster_IN,main = "Raster File")

plot(NDVI_raster_IN,main = "Raster File")

#### data exploration 
# Creating a spatial map using ggplot2 
ggplot() +
    geom_sf(data = IN, aes(fill = IN$REGION)) +
    theme_minimal() +
    labs(title = "Spatial Data Map", fill = "REGION")
## Warning: Use of `IN$REGION` is discouraged.
## ℹ Use `REGION` instead.

### below is the visualization of the region of the state of indiana
##Creating dataframe for the raster files for the purpose further visualization
rast_df1 <- as.data.frame(IGBP_raster, xy = TRUE)
rast_df2 <- as.data.frame(NDVI_raster, xy = TRUE)
### checking the unique month levels found in Landcover dataset
months <- unique(Landcover_NDVI_Rainfall_filter$Month)
months
##  [1] 1  2  3  4  5  6  7  8  9  10 11 12
## Levels: 1 2 3 4 5 6 7 8 9 10 11 12
## visualization of landcover

Landcover_NDVI_Rainfall_filter %>% filter(Month==months[1]) %>%
  ggplot() +
  geom_point(aes(x = x, y = y, fill = MeanNDVI, col = MeanNDVI))

# Visualising using ggplot2
ggplot() + 
  geom_raster(data =Landcover_NDVI_Rainfall_filter,
              aes(x = x, y = y, fill = Landcover)) +
  geom_sf(data = IN_pro, inherit.aes = FALSE, fill = NA) +
  scale_fill_viridis(name = "Land Cover Type", discrete = TRUE) +
  labs(title = "Land Cover classification",
       subtitle = "01-01-2019 - 31-12-2019",
       x = "Longitude",
       y = "Latitude") +
  theme_minimal()
## Warning: Raster pixels are placed at uneven horizontal intervals and will be shifted
## ℹ Consider using `geom_tile()` instead.

Visualization based on Seasons- Winter, Spring, Summer, Fall

## visualization based on seasons- Winter, Spring, Summer, Fall
Landcover_NDVI_Rainfall_filter$season <- ifelse(Landcover_NDVI_Rainfall_filter$Month %in% c('1', '2', '3'), 'Winter',
                    ifelse(Landcover_NDVI_Rainfall_filter$Month %in% c('4', '5', '6'), 'Spring', 
                           ifelse(Landcover_NDVI_Rainfall_filter$Month %in% c('7', '8', '9'), 'Summer', 'Fall')))
Landcover_NDVI_Rainfall_filter$season <- as.factor(Landcover_NDVI_Rainfall_filter$season)
Landcover_NDVI_Rainfall_filter %>% filter(season == 'Fall') %>% ggplot(aes(x = x, y = y, fill = Rainfall)) + 
  geom_tile()+ theme_minimal()+ 
  scale_fill_viridis_c(option = 'viridis') + labs(title = 'Rainfall in Indiana', subtitle = 'Fall 2019') + xlab("") + ylab("") 

Landcover_NDVI_Rainfall_filter %>% filter(season == 'Spring') %>% ggplot(aes(x = x, y = y, fill = Rainfall)) + geom_tile()+ theme_minimal()+ 
  scale_fill_viridis_c(option = 'viridis') + labs(title = 'Rainfall in Indiana', subtitle = 'Spring 2019') + xlab("") + ylab("") 

Landcover_NDVI_Rainfall_filter %>% filter(season == 'Winter') %>% ggplot(aes(x = x, y = y, fill = Rainfall)) + geom_tile()+ theme_minimal()+ 
  scale_fill_viridis_c(option = 'viridis')+ labs(title = 'Rainfall in Indiana', subtitle = 'Winter 2019') + xlab("") + ylab("") 

Landcover_NDVI_Rainfall_filter %>% filter(season == 'Summer') %>% ggplot(aes(x = x, y = y, fill = Rainfall)) + geom_tile()+ theme_minimal() + 
  scale_fill_viridis_c(option = 'viridis')+ labs(title = 'Rainfall in Indiana', subtitle = 'Summer 2019') + xlab("") + ylab("") 

## changing month to Jan, Feb etc.
df <- NDVI_timeseries_Month
df$Month <- month.abb[df$Month]
###### MeanNDVI ###########
#visualization of the meanndvi variable
ggplot(Landcover_NDVI_Rainfall_filter, aes(x=MeanNDVI)) +
  geom_histogram(position= "dodge", alpha=0.5, binwidth = 0.005, color= "blue",fill="green")+
  labs(title= "MeanNDVI Distribution",x= "MeanNDVI values",y= "count")+
  theme_classic()

## geom_histogram is the geometric function
## Labs allow us to set the title
## theme_ classic suggest the basic theme for the plotting area
###### NDVI ###########
## boxplot indicating the vegetation index which suggest an increase of the vegetation index from Jan to Jun and a decline from Aug to Dec.
ggplot(df, aes(x = Month, y = NDVI)) + geom_boxplot() + theme_minimal()+ scale_x_discrete(limits = month.abb)

##boxplot of meanNDVI against Landcover types
ggplot(Landcover_NDVI_Rainfall_filter, aes(x = Landcover, y = MeanNDVI)) + geom_boxplot() + theme_minimal()

## Anova Analysis

Below is ANOVA analysis of NDVI with month and land cover types using data in 2019.

## group of landcover types agianst the summary of meanNDVI
Landcover_NDVI_Rainfall_filter %>% group_by(Landcover) %>% summarise(mean(MeanNDVI))
## # A tibble: 13 × 2
##    Landcover                    `mean(MeanNDVI)`
##    <fct>                                   <dbl>
##  1 Evergreen needleleaf forests            0.496
##  2 Evergreen broadleaf forests             0.531
##  3 Deciduous needleleaf forests            0.665
##  4 Deciduous broadleaf forests             0.670
##  5 Mixed forests                           0.627
##  6 Closed shrublands                       0.598
##  7 Open shrublands                         0.514
##  8 Woody savannas                          0.412
##  9 Savannas                                0.469
## 10 Grasslands                              0.450
## 11 Permanent wetlands                      0.586
## 12 Croplands                               0.218
## 13 Urban and built-up lands                0.136
##  Two way Anova Test
two_way <- aov(MeanNDVI ~ Month + Landcover, data= Landcover_NDVI_Rainfall_filter)
summary(two_way)
##                  Df Sum Sq Mean Sq F value Pr(>F)    
## Month            11 160083   14553 1549451 <2e-16 ***
## Landcover        12  28138    2345  249650 <2e-16 ***
## Residuals   4857089  45619       0                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can now check normality

par(mfrow = c(1, 2)) # combine plots

# histogram
hist(two_way$residuals)

# QQ-plot
library(car)
## Warning: package 'car' was built under R version 4.3.3
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
qqPlot(two_way$residuals
)

## [1] 1116542 1089817

From the histogram and QQ-plot above, we can observe that the normality assumption seems to be met. As shown above, the histogram roughly form a bell curve, indicating that the residuals follow a normal distribution. Furthermore, points in the QQ-plots roughly follow the straight line and most of them are within the confidence bands, also indicating that residuals follow approximately a normal distribution.

Ordinary linear regression analysis of NDVI with precipitation

model <- lm(MeanNDVI ~ Rainfall, data = Landcover_NDVI_Rainfall_filter)
model
## 
## Call:
## lm(formula = MeanNDVI ~ Rainfall, data = Landcover_NDVI_Rainfall_filter)
## 
## Coefficients:
## (Intercept)     Rainfall  
##   0.4611341    0.0004597
summary(model)
## 
## Call:
## lm(formula = MeanNDVI ~ Rainfall, data = Landcover_NDVI_Rainfall_filter)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56272 -0.17996 -0.04201  0.19264  0.47835 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.611e-01  2.623e-04  1758.0   <2e-16 ***
## Rainfall    4.597e-04  2.301e-06   199.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2185 on 4857111 degrees of freedom
## Multiple R-squared:  0.00815,    Adjusted R-squared:  0.00815 
## F-statistic: 3.991e+04 on 1 and 4857111 DF,  p-value: < 2.2e-16
### model coefficients
coefficients(model)
##  (Intercept)     Rainfall 
## 0.4611341166 0.0004596665

Slope and Intercept Plot

ggplot(Landcover_NDVI_Rainfall_filter, aes(Rainfall, MeanNDVI)) +
  geom_point() +
  geom_abline(intercept = model$coefficients[1], 
              slope = model$coefficients[2], 
              color = "red")

Results Interpretation

The output table above in the first place reports the formula that was used to generate the results(call), the summary of the model residuals and finally the coefficient and intercept of the results. The y-intercept of the linear regression equation is reported as 4.611e-01. The coefficients describes the estimated effect of Rainfall on NDVI. There is also a report on the residual standard error, R-square, f-statistics and p-value. The p-value reported for the model is 2.2e-16

Because the p value is so low (p-value: < 2.2e-16), we can reject the null hypothesis and conclude that Rainfall has a statistically significant effect on NDVI.

Time series analysis with NDVI time series from 2013-2022 over croplands

Time series analysis with NDVI time series from 2013-2022 over croplands

  • Plot and decompose time series of NDVI

  • Split the data into two parts: training data: 2013-2020; test data: 2021-2022

  • Model the time series analysis and forecasting with the training data and valid it with the test data

  • Interpret your time series analysis

library(zoo)
## 
## Attaching package: 'zoo'
## The following object is masked from 'package:terra':
## 
##     time<-
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(padr)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:raster':
## 
##     select
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(knitr)
## 
## Attaching package: 'knitr'
## The following object is masked from 'package:terra':
## 
##     spin
library(tseries)
### Unique dates 
dates <- unique(NDVI_timeseries_Month$YearMonth)
dates
##   [1] "2013-01-01" "2013-02-01" "2013-03-01" "2013-04-01" "2013-05-01"
##   [6] "2013-06-01" "2013-07-01" "2013-08-01" "2013-09-01" "2013-10-01"
##  [11] "2013-11-01" "2013-12-01" "2014-01-01" "2014-02-01" "2014-03-01"
##  [16] "2014-04-01" "2014-05-01" "2014-06-01" "2014-07-01" "2014-08-01"
##  [21] "2014-09-01" "2014-10-01" "2014-11-01" "2014-12-01" "2015-01-01"
##  [26] "2015-02-01" "2015-03-01" "2015-04-01" "2015-05-01" "2015-06-01"
##  [31] "2015-07-01" "2015-08-01" "2015-09-01" "2015-10-01" "2015-11-01"
##  [36] "2015-12-01" "2016-01-01" "2016-02-01" "2016-03-01" "2016-04-01"
##  [41] "2016-05-01" "2016-06-01" "2016-07-01" "2016-08-01" "2016-09-01"
##  [46] "2016-10-01" "2016-11-01" "2016-12-01" "2017-01-01" "2017-02-01"
##  [51] "2017-03-01" "2017-04-01" "2017-05-01" "2017-06-01" "2017-07-01"
##  [56] "2017-08-01" "2017-09-01" "2017-10-01" "2017-11-01" "2017-12-01"
##  [61] "2018-01-01" "2018-02-01" "2018-03-01" "2018-04-01" "2018-05-01"
##  [66] "2018-06-01" "2018-07-01" "2018-08-01" "2018-09-01" "2018-10-01"
##  [71] "2018-11-01" "2018-12-01" "2019-01-01" "2019-02-01" "2019-03-01"
##  [76] "2019-04-01" "2019-05-01" "2019-06-01" "2019-07-01" "2019-08-01"
##  [81] "2019-09-01" "2019-10-01" "2019-11-01" "2019-12-01" "2020-01-01"
##  [86] "2020-02-01" "2020-03-01" "2020-04-01" "2020-05-01" "2020-06-01"
##  [91] "2020-07-01" "2020-08-01" "2020-09-01" "2020-10-01" "2020-11-01"
##  [96] "2020-12-01" "2021-01-01" "2021-02-01" "2021-03-01" "2021-04-01"
## [101] "2021-05-01" "2021-06-01" "2021-07-01" "2021-08-01" "2021-09-01"
## [106] "2021-10-01" "2021-11-01" "2021-12-01" "2022-01-01" "2022-02-01"
## [111] "2022-03-01" "2022-04-01" "2022-05-01" "2022-06-01" "2022-07-01"
## [116] "2022-08-01" "2022-09-01" "2022-10-01" "2022-11-01" "2022-12-01"

Check for missing values

# 2. Check for missing values
missing_values <- sum(is.na(NDVI_timeseries_Month))
if (missing_values > 0) {
  print(paste("Number of missing values:", missing_values))
} else {
  print("No missing values found.")
}
## [1] "No missing values found."

Check for invalid NDVI values

invalid_ndvi <- sum(NDVI_timeseries_Month$NDVI < -1 | NDVI_timeseries_Month$NDVI > 1)
if (invalid_ndvi > 0) {
  print(paste("Number of invalid NDVI values:", invalid_ndvi))
  # You can take appropriate actions, such as removing or imputing invalid values
} else {
  print("No invalid NDVI values found.")
}
## [1] "No invalid NDVI values found."

Average all points

library(dplyr)
station1 <- NDVI_timeseries_Month %>%
  group_by(YearMonth) %>%
  summarise(NDVI = mean(NDVI))
## first 10 rows of the NDVI data
kable(head(station1,10),caption= "The first 10 rows of the  NDVI data")
The first 10 rows of the NDVI data
YearMonth NDVI
2013-01-01 0.2311683
2013-02-01 0.3012603
2013-03-01 0.2946832
2013-04-01 0.3971307
2013-05-01 0.4623701
2013-06-01 0.6359540
2013-07-01 0.8445031
2013-08-01 0.8020901
2013-09-01 0.6559551
2013-10-01 0.4593558
# Convert YearMonth field from character to Date in date time format
station1$YearMonth<-as_datetime(station1$YearMonth)
###Choose three-year data
SiteNDVI_10yrs<-station1 %>% 
  select(YearMonth,NDVI) %>%  #only select variable we want
  filter(YearMonth>="2013-01-01" & YearMonth< "2022-12-31")
# Check the first of the time frame
head(SiteNDVI_10yrs,5)
## # A tibble: 5 × 2
##   YearMonth            NDVI
##   <dttm>              <dbl>
## 1 2013-02-01 00:00:00 0.301
## 2 2013-03-01 00:00:00 0.295
## 3 2013-04-01 00:00:00 0.397
## 4 2013-05-01 00:00:00 0.462
## 5 2013-06-01 00:00:00 0.636
# Check The End of the Time Frame
tail(SiteNDVI_10yrs,5)
## # A tibble: 5 × 2
##   YearMonth            NDVI
##   <dttm>              <dbl>
## 1 2022-08-01 00:00:00 0.847
## 2 2022-09-01 00:00:00 0.732
## 3 2022-10-01 00:00:00 0.458
## 4 2022-11-01 00:00:00 0.327
## 5 2022-12-01 00:00:00 0.246
# find invalid NDVI
InvalidNDVI <- SiteNDVI_10yrs%>% filter(NDVI<0)
print(InvalidNDVI)
## # A tibble: 0 × 2
## # ℹ 2 variables: YearMonth <dttm>, NDVI <dbl>
#### Original 10-year data
ggplot(station1,aes(x=YearMonth,y=NDVI))+geom_line()

#### Use ggplot() for time series data visualization to any two-year data in the 10-year time range of 2013-01-01 to 2022-12-31.After subsetting
ggplot(SiteNDVI_10yrs,aes(x=YearMonth,y=NDVI))+geom_line()

Split the data into two parts: training data: 2013-2020; test data: 2021-2022

Split the data from 2013-01-01 to 2023-12-31(11-year data) into two data sets.

Training data: 2013-01-01 to 2020-12-31.

Test data: 2021-01-01 to 2022-12-31.

### 8 years
SiteNDVI_8yrs_train<-SiteNDVI_10yrs %>%
  filter(YearMonth>="2013-01-01" & YearMonth < "2020-12-31") 

### 3 years
SiteNDVI_3yrs_test<-SiteNDVI_10yrs %>%
  filter(YearMonth>="2021-01-01" & YearMonth < "2022-12-31") 
###Time series visualization using ggplot()
NDVI_plot <- SiteNDVI_8yrs_train %>%
  ggplot(aes(x = YearMonth, y = NDVI)) +
  geom_line(aes(color = "NDVI")) +
  scale_x_datetime(name = "YearMonth", date_breaks = "2 year") +
  scale_y_continuous(breaks = seq(0, 1, 0.2)) +
  theme_minimal() +
  labs(title = "Indiana NDIV over 2021", subtitle = "2020-1-1 - 2022-12-31")

ggplotly(NDVI_plot)

Convert training data and test data at 8-day scale to monthly scale

We create a seasonal time series (msts), decompose the msts, visualize the decomposed multi-time series and then save the msts to a data frame object.

SiteNDVI_8yrs_train_monthly <- SiteNDVI_8yrs_train %>% 
  mutate(month = month(as.Date(YearMonth)), year = year(as.Date(YearMonth))) %>% 
  group_by(year, month) %>%
  summarise(mean_NDVI = mean(NDVI, na.rm = T), .groups = "keep") %>%
  as.data.frame()

SiteNDVI_8yrs_train_monthly <- SiteNDVI_8yrs_train_monthly[1:96,]



SiteNDVI_3yrs_test_monthly <- SiteNDVI_3yrs_test %>% 
  mutate(month = month(as.Date(YearMonth)), year = year(as.Date(YearMonth))) %>% 
  group_by(year, month) %>%
  summarise(mean_NDVI = mean(NDVI, na.rm = T), .groups = "keep") %>%
  as.data.frame()

Decompose the monthly training data you using both of stlm() and mstl() functions.

##Create the time series object using ts()
NDVI_ts_train<-ts(SiteNDVI_8yrs_train_monthly$mean_NDVI,start = c(2013,1), end = c(2022,12),frequency = 12)
plot(NDVI_ts_train)

##Create the time series object using msts()
NDVI_multi0 <-msts(SiteNDVI_8yrs_train_monthly$mean_NDVI,seasonal.periods = c(12)) #monthly
plot(NDVI_multi0)

Decompose the time series data

library(zoo)
NDVI_ts_train_dec<-decompose(na.StructTS(NDVI_ts_train))%>%
  plot()

The plot above shows the original time series (top), the estimated trend component (second from top), the estimated seasonal component (third from top), and the estimated random component (bottom). From the above it can be observed that the estimated trend shows a small decrease in 2018-19, which is followed by a steady increase from 2021 and rapid fall into 2022. There is a further decline in the trend as seen in the 2022 towards 2023.

NDVI_ts_train_stl1 <- stlm(NDVI_ts_train, s.window = 'periodic')
## Warning in ets(x, model = etsmodel, allow.multiplicative.trend =
## allow.multiplicative.trend, : Missing values encountered. Using longest
## contiguous portion of time series
# NDVI_ts_train_stl1 <- stlm(NDVI_ts_train, s.window = c(12))
# plot(NDVI_ts_train_stl1)  # cannot use plot() here


#Method2: mstl(): Multiple seasonal decomposition: daily, weekly , monthly. 
# Use x value
NDVI_multi0 <-msts(SiteNDVI_8yrs_train_monthly$mean_NDVI,seasonal.periods = c(12)) #monthly
NDVI_multi0%>%
  mstl() %>% 
  autoplot()

# use time series (ts) object
NDVI_multi <-msts(NDVI_ts_train,seasonal.periods = c(12)) #monthly
NDVI_multi%>%
  mstl() %>% 
  autoplot()

Interpret the results of your Seasonality Analysis on the training data. For example, how NDVI changes along with time.

From the output we can clearly see the seasonal component and separated upward trend of the data.

#Method 1: moving average,Moving-average smoothing
par(mfrow=c(1,1))
ts.plot(NDVI_ts_train)
sm <- ma(NDVI_ts_train, order=12) # 12 month, moving average,Moving-average smoothing
lines(sm, col="blue") # plot

#Method 2: Simple exponential smoothing: Level Only
model <- hw(NDVI_ts_train, initial = "optimal", h=(12), beta=NULL, gamma=NULL) # h is the no. periods to forecast
## Warning in ets(x, "AAA", alpha = alpha, beta = beta, gamma = gamma, phi = phi,
## : Missing values encountered. Using longest contiguous portion of time series
#Method 3:Double Exponential smoothing: Level and Trend components
model <- hw(NDVI_ts_train, initial = "optimal", h=(12), gamma=NULL)
## Warning in ets(x, "AAA", alpha = alpha, beta = beta, gamma = gamma, phi = phi,
## : Missing values encountered. Using longest contiguous portion of time series
#Method 4: Holt Winters: Level, Trend and Seasonality
model <- hw(NDVI_ts_train, initial = "optimal", h=(12))#12 define the forecast Period Length
## Warning in ets(x, "AAA", alpha = alpha, beta = beta, gamma = gamma, phi = phi,
## : Missing values encountered. Using longest contiguous portion of time series
plot(model)

accuracy(model) # calculate accuracy measures
##                         ME      RMSE        MAE       MPE     MAPE      MASE
## Training set -0.0005086566 0.0466827 0.03406028 -5.261922 13.40567 0.7002557
##                   ACF1
## Training set 0.4210418

Check the stationarity

If both ACF and PACF plots show significant autocorrelation only at lag 0 (i.e., the first value) and decay quickly afterward, it suggests the time series is likely stationary. If the ACF plot shows a slow decay and/or the PACF plot has significant correlations at multiple lags, it might indicate non-stationary. If there is a sinusoidal pattern or gradual decay in the ACF and PACF plots, it might indicate seasonality, suggesting non-stationary.

acf(NDVI_ts_train,lag.max=60,xaxt="n", xlab="Lag (months)", na.action = na.pass)

pacf(NDVI_ts_train,lag.max=60,xaxt="n", xlab="Lag (months)", na.action = na.pass)

Arima: Models and Forecasting

Step 6.1: Non-Seasonal ARIMA

########################################1. None seasonal ARIMA-remove seasonality and then focus on the data  to removed seasonality
NDVI_ts_train0<-ts(NDVI_ts_train,frequency = 12)
AR0 <- Arima(NDVI_ts_train0, order = c(1,0,0)) #AR(1)

Step 6.2: Seasonal ARIMA

########################################2. Seasonal ARIMA-with seasonality
AR <- Arima(NDVI_ts_train, order = c(2,0,2),seasonal=c(1,1,1)) 
#AR <- Arima(NDVI_ts_train, order = c(1,0,3),seasonal=c(0,1,1)) 
#AR <- Arima(NDVI_ts_train, order = c(2,0,2),seasonal=c(0,0,0)) #non-season arima

#plotting the NDVI_ts_train series plus the forecast and 95% prediction intervals
ts.plot(NDVI_ts_train,xlim=c(2013,2024))
AR_forecast <- predict(AR, n.ahead = 24)$pred
AR_forecast_se <- predict(AR, n.ahead = 24)$se
points(AR_forecast, type = "l", col = 2,lwd = 3)
points(AR_forecast - 2*AR_forecast_se, type = "l", col = 2, lty = 2,lwd=2)
points(AR_forecast + 2*AR_forecast_se, type = "l", col= 2, lty = 2,lwd = 2)

## check for residuals
checkresiduals(AR)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,0,2)(1,1,1)[12]
## Q* = 19.434, df = 18, p-value = 0.3656
## 
## Model df: 6.   Total lags used: 24
## residuals bell shaped

Step 6.3: Auto ARIMA

########################################3. Auto ARIMA:produce the same result as the method2
AutoArimaModel=auto.arima(NDVI_ts_train)
AutoArimaModel   #ARIMA(2,0,2)(1,1,1)[12]
## Series: NDVI_ts_train 
## ARIMA(1,0,0)(2,1,0)[12] 
## 
## Coefficients:
##          ar1     sar1     sar2
##       0.4843  -0.5576  -0.4961
## s.e.  0.0875   0.0974   0.0877
## 
## sigma^2 = 0.002569:  log likelihood = 164.17
## AIC=-320.34   AICc=-319.95   BIC=-309.61
############predict
ts.plot(NDVI_ts_train,xlim=c(2013,2024))
AR_prediction <- predict(AutoArimaModel, n.ahead = 24)$pred
AR_prediction_se <- predict(AutoArimaModel, n.ahead = 24)$se
points(AR_prediction, type = "l", col = 2,lwd = 3)
points(AR_prediction - 2*AR_prediction_se, type = "l", col = 2, lty = 2,lwd=2)
points(AR_prediction + 2*AR_prediction_se, type = "l", col= 2, lty = 2,lwd = 2)

############forecast
AR_forecast<-forecast(AutoArimaModel, h=12*2)
NDVI_ts_train%>%
  autoplot() +
  autolayer(AR_forecast,series = "Multi-Seasonal ARIMA",color = "purple")

Step 6.4:A model with mstl() object

###mlst object
NDVI_multi <-msts(NDVI_ts_train,seasonal.periods = c(12)) #monthly
NDVI_multi_ts <- NDVI_multi%>%
  mstl()

autoplot(NDVI_multi_ts)

f_arima1<-forecast(NDVI_multi_ts, h=12*2)  #ETS(M,N,N) 
NDVI_ts_train%>%
  autoplot() +
  autolayer(f_arima1,series = "Multi-Seasonal Holt_Winter",color = "purple")

Step 6.5: Models with stlm() object

# stlm object: multi seasonal ARIMA model, using arima model,ARIMA(0,0,3)
NDVI_stlm<-stlm(NDVI_ts_train, method = "arima")
f_arima2<-forecast(NDVI_stlm, h=12*2) #ARIMA(0,0,3)
NDVI_ts_train%>%
  autoplot() +
  autolayer(f_arima2,series = "Multi-Seasonal ARIMA",color = "purple")

7. Compare models and validate models

Step 7.1: Compare AIC and BIC values and different errors

The measures calculated are:

By default, the MASE calculation is scaled using MAE of training set naive forecasts for non-seasonal time series, training set seasonal naive forecasts for seasonal time series and training set mean forecasts for non-time series data. If f is a numerical vector rather than a forecast object, the MASE will not be returned as the training data.

AIC_BIC <- rbind(c(f_arima1$model$aic,f_arima1$model$bic),c(f_arima2$model$aic,f_arima2$model$bic),c(AutoArimaModel$aic,AutoArimaModel$bic),c(AR$aic,AR$bic))
colnames(AIC_BIC) <- c("AIC","BIC")
rownames(AIC_BIC) <- c("mlst","stlm_arima","auto_arima","AR")
accuracy(f_arima1)
##                         ME       RMSE        MAE       MPE     MAPE     MASE
## Training set -0.0008121236 0.04483284 0.03524469 -4.197212 12.69528 0.741506
##                    ACF1
## Training set 0.07517558
Accuracy<-rbind(accuracy(f_arima1),accuracy(f_arima2),accuracy(AutoArimaModel),accuracy(AR))

rownames(Accuracy) <- c("mlst","stlm_arima","auto_arima","AR")

kable(Accuracy,caption= "Errors (Accuracy) comparison of 4 models")
Errors (Accuracy) comparison of 4 models
ME RMSE MAE MPE MAPE MASE ACF1
mlst -0.0008121 0.0448328 0.0352447 -4.197212 12.69528 0.7415060 0.0751756
stlm_arima -0.0005982 0.0402946 0.0305436 -5.088403 12.46081 0.6426015 0.0198871
auto_arima -0.0003873 0.0473815 0.0343624 -4.876132 13.41552 0.7229436 -0.0500050
AR 0.0005659 0.0406224 0.0293476 -3.891634 11.59142 0.6174383 -0.0923885
kable(AIC_BIC,caption= "AIC and BIC values comparision of four models")
AIC and BIC values comparision of four models
AIC BIC
mlst -165.6161 -157.2537
stlm_arima -420.1034 -411.7660
auto_arima -320.3402 -309.6489
AR -328.8271 -310.1173

Valid the model(s) with test data

NDVI_prediction <- as.data.frame(f_arima1$mean)
predicted_NDVI <- as.numeric(NDVI_prediction$x)

observed_NDVI <- SiteNDVI_3yrs_test_monthly$mean_NDVI
max1<-max(observed_NDVI)
max2<-max(predicted_NDVI)
plot(observed_NDVI,predicted_NDVI[1:length(observed_NDVI)],xlim=c(0,max(max1,max2)),ylim=c(0,max(max1, max2)), xlab="Observed NDVI" , ylab="Predicted NDVI" , col="red", pch=5)
lines(c(0,max(max1, max2)),c(0,max(max1, max2)),col="black",lwd=3,lty=1)
legend(95,40,c('','1:1Line'),lty=c(NA,1),pch=c(NA,NA),lwd=c(NA,3),bg='white',ncol=1,box.lty=0)

lmMod <- lm(observed_NDVI ~ predicted_NDVI[1:length(observed_NDVI)]) 
summary (lmMod)
## 
## Call:
## lm(formula = observed_NDVI ~ predicted_NDVI[1:length(observed_NDVI)])
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.101633 -0.017430  0.007956  0.023209  0.084237 
## 
## Coefficients:
##                                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)                             -0.00193    0.02495  -0.077    0.939
## predicted_NDVI[1:length(observed_NDVI)]  1.02747    0.04821  21.312 1.05e-15
##                                            
## (Intercept)                                
## predicted_NDVI[1:length(observed_NDVI)] ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04699 on 21 degrees of freedom
## Multiple R-squared:  0.9558, Adjusted R-squared:  0.9537 
## F-statistic: 454.2 on 1 and 21 DF,  p-value: 1.049e-15